I. Statement and History of the Problem.


The  computer  depth  perception  problem  is  the  derivation  of  the  distance  from  the  camera  to 
each  point  of  the  viewed  scene. 


Much work has been done with monocular views of the so-called "blocks world": white polyhedra
on a dark table. Here the picture is first reduced to a line drawing, which the computer perceives in
much the same [...], applying task-specific prior knowledge to resolve ambiguities: all objects are
assumed to be supported [...] the table (or other [...]). [...] depth perception problem is solved.
Gunnar Grape [Grape] gives a description of his and previous approaches to this rather limited task
domain.


Other methods have [...]. The former is of limited utility in a general environment, while the latter
is quite successful for near-field work (a powerful laser is required to extend this range).


[...] views of a single object were used by Bruce Baumgart [Baumgart] to derive a model of [...] the
silhouette of a toy doll or horse as it rotated on a turntable, deriving its [...] "intersection of
cones". This is an inefficient way of deriving depth information, in [...] whereas it is possible to do
much more than that (vide this report). Also, this method presupposes an ability to determine the
object-background boundary, a nontrivial problem in itself under normal lighting conditions.


This report describes a stereo vision approach to depth perception; the author has built upon a
set of programs that decompose the problem in the following way:

1. Production of a camera model: the position and orientation of the cameras in 3-space.

2. Generation of matching point-pairs: loci of corresponding features in the two pictures.

3.  Computation  of  the  point  in  3-space  for  each  point-pair. 

4.  Presentation  of  the  resultant  depth  information. 


Sub-problem 1 has been adequately solved [Hannah] for high-quality picture pairs with
relatively small (<10%) perspective distortion of corresponding objects.


Hannah also attacked sub-problem 2; the present report describes several refinements on her
approach to make it less sensitive to perspective distortion, less dependent on human interaction, and a
[...] prospective point-pairs. A group at JPL [Levine] handles this sub-problem in a quite different
way; the methods are compared in section III.


Sub-problem  3 is  a trivial  exercise  in  trigonometry  and  linear  algebra,  given  the  camera  model 
generated  by  sub-problem  1. 
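
As an illustration of sub-problem 3, the following is a minimal sketch rather than the routine used in the author's program. It assumes the camera model has already been reduced to a lens center and a viewing-ray direction for each member of a point-pair; the function name `triangulate` and its arguments are hypothetical. Because the two rays are normally skew, the sketch returns the midpoint of the shortest segment joining them.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(u, v))


def triangulate(center_a, dir_a, center_b, dir_b):
    """Return the 3-space point midway between the closest points of two rays.

    center_a, center_b: camera centers (x, y, z).
    dir_a, dir_b: ray directions through the matched pixels (need not be unit length).
    """
    w0 = [p - q for p, q in zip(center_a, center_b)]
    a = dot(dir_a, dir_a)
    b = dot(dir_a, dir_b)
    c = dot(dir_b, dir_b)
    d = dot(dir_a, w0)
    e = dot(dir_b, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; no depth can be computed")
    s = (b * e - c * d) / denom  # parameter of the closest point along ray A
    t = (a * e - b * d) / denom  # parameter of the closest point along ray B
    point_a = [p + s * u for p, u in zip(center_a, dir_a)]
    point_b = [q + t * v for q, v in zip(center_b, dir_b)]
    return [(p + q) / 2.0 for p, q in zip(point_a, point_b)]
```

The distance between the two closest points could also serve as a rough consistency check on a proposed point-pair, although the report does not say that this is done.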


This report describes a program [...]. A [...] to discriminate objects within the scene is
incorporated in this program.
II. Conventions and Basic Concepts of Stereo Picture Processing.

A picture is a two-dimensional array of integer values which represent the light intensities of a
scene as seen through some camera, at a set of sample points. Several parameters are of immediate
interest, such as the imaging geometry of the camera, the number and spacing of the sample points
(spatial resolution), and the precision to which the light intensities are recorded (gray-scale resolution).
[...] cameras have relatively long focal lengths, so that "pin-cushion" distortion is minimal
("pin-cushioning" is [...] produced by a "fish-eye" lens). Due to space limitations in the [...],
spatial resolution is limited to some tens of thousands of sample points per picture, while
gray-scale resolution is normally 6 bits (64 intensity levels). This leads to an image quality quite similar
to that of a normal television picture.

Sample points ("pixels") fall on a rectangular grid; Cartesian coordinates are the natural choice. In
keeping with the conventions used in the television industry, pixels are identified by their (I,J) positions
with respect to the upper left-hand corner of the picture, which has position (0,0). The I-dimension
increases to the right; the J-dimension increases downward. To distinguish between pixels in the two
pictures comprising a stereo pair, (IA,JA) are used as coordinate labels in one picture, (IB,JB) in the
other.

Stereo matching is the process of finding areas in the two pictures that correspond to the same
3-D piece of scenery in the "real world". For example, the area around (130,115) in the left picture of
the "barn" pair (see Section IX) matches the area around (15,115) in the right picture: both are views of
the fence post in the foreground. Intuitively, one area matches another if the values of
corresponding pixels are nearly equal. Exact pixel-by-pixel equality is never observed, due to errors
from a number of sources. First, the cameras are looking at this piece of scenery from different points
of view, thus changing its apparent shape and shading. Second, potential matching areas in the two
pictures must be centered on actual sample points, since these are the only places that intensities have
been observed: interpolation of intensity values between pixels is slow and inaccurate. Thus a
"matching area" is merely within a pixel of the correct match, and the observed intensity values at the
[...] corresponding pixels are not expected to be equal. Finally, the cameras are far from
[...]: differences in "gain", "offset", and "noise" are to be expected. A statistical method of detecting
[...] matches is clearly indicated: normalized correlation has been chosen as the match metric, [...]
match between two areas, where +1 is attainable only by the perfect match (two areas only differing
in relative gain and offset).
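
The match metric can be sketched as follows. This is an illustration only: it assumes the familiar product-moment form of normalized correlation, which may differ in detail from the formula in Hannah's program, and the name `normalized_correlation` is hypothetical. Each window is passed as a flat sequence of pixel intensities.

```python
import math


def normalized_correlation(a, b):
    """Correlation of two equal-sized windows, given as flat intensity sequences.

    Returns a value in [-1, +1]; +1 is reached only when the windows differ
    by a positive gain and an offset. A window with no intensity variation
    carries no information, so 0.0 is returned for it.
    """
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)
```

Subtracting the means removes the "offset" difference between cameras, and dividing by the standard deviations removes the "gain" difference; only "noise" and genuine changes of appearance lower the score.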

The term "matching point-pairs" is shorthand for "the pair of points that lie at the center of a
pair of matching areas". For computational reasons, these areas are rectangular windows.

Stereo matching is not an infallible means of analysis. "False matches" are an ever-present
problem arising for two reasons. There may be multiple, highly correlating matches caused by
repetition of features: imagine trying to match two views of a freshly-painted picket fence against a
uniform background. Also, the discovery of a correct match may be blocked by the occlusion of the
scenery seen in one picture by a closer object; a spuriously high correlating point-pair may be
selected instead.

More reliable matches can be obtained by increasing the size of the correlation windows, thus
producing more significant correlations (in the statistical sense). Unfortunately, large windows make it
difficult to match near the edges of objects: the windows will include non-matching background
[...] window, due to perspective distortion.

Windows containing 121 pixels (11*11) seem to be a good compromise.
III. Generation of Matching Point-Pairs.


The approach described herein is due to Marsha Jo Hannah [Hannah], who in turn relied on the
image processing groundwork laid at Stanford AI by Lynn Quam. Briefly, the technique used is the
examination of a subset of all possible point-pairs in the two pictures, making the decision "match" or
"no match" for each. For computational speed and ease, many candidate areas in picture B are
compared against a single "target" area in picture A until a match is found.

An exhaustive examination of all point-pairs is clearly out of the question, when the pictures
under consideration contain tens of thousands of sample points. Thus, various search-reduction
techniques have been developed. The most important of these is the continuity assumption: if (IB,JB)
matches (IA,JA), then the match for (IA+DI,JA+DJ) should be found close to (IB+DI,JB+DJ); and if (IB,JB) is
a very poor match for (IA,JA), then there is not much hope of finding a good match in the vicinity of
(IB,JB). Finally, given a camera model, trigonometrically possible matches for (IA,JA) will be found on a
straight line in picture B. This is the "matching line" derived in [Hannah].

At the heart of the point-pair generator is a "region grower", which tries to find matches for the
four nearest neighbors of every matched "target" area. It uses the continuity assumption to compute
the most likely "candidate" area for a given neighbor: if this is not an acceptable match, then a local
search is initiated. All matches found by examining the neighbors of a given seed point-pair are said
to belong to a "region". A "region" is thus a portion of the scene with no depth discontinuities.
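
The region grower can be sketched in a few lines. This is a simplified illustration, not Hannah's code: the predicate `is_match` is a hypothetical stand-in for the whole battery of acceptance tests, and the local search that follows a failed prediction is omitted. Picture coordinates are (I,J) tuples.

```python
from collections import deque


def grow_region(seed_a, seed_b, is_match, step=1):
    """Grow a region of point-pairs outward from one matched seed pair.

    seed_a, seed_b: (I, J) coordinates of a known match in pictures A and B.
    is_match(target, candidate): hypothetical acceptance test returning a bool.
    Returns a dict mapping each matched target point to its candidate point.
    """
    matches = {seed_a: seed_b}
    queue = deque([seed_a])
    while queue:
        ia, ja = queue.popleft()
        ib, jb = matches[(ia, ja)]
        # Try the four nearest neighbors of the matched target area.
        for di, dj in ((step, 0), (-step, 0), (0, step), (0, -step)):
            target = (ia + di, ja + dj)
            if target in matches:
                continue
            # Continuity assumption: the neighbor's match lies at the same offset.
            candidate = (ib + di, jb + dj)
            if is_match(target, candidate):
                matches[target] = candidate
                queue.append(target)
    return matches
```

A first-in, first-out queue makes the region spread evenly around the seed, so that each prediction is made from the nearest available match.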

An alternative approach, explored by the JPL group [Levine], dispenses with the overhead of
remembering the perimeter of the current region by picking target areas in a uniform top-to-bottom,
row-by-row manner. The point-pairs of the preceding row are used to compute likely candidate areas
for the current row. It is not clear to the author which approach is the more efficient.


IV.  Criteria  for  Accepting  a Match. 

The  decision  between  "match"  and  "no  match"  is  the  most  interesting  part  of  point-pair  matching. 

Many  heuristic  techniques  have  been  proposed;  the  author’s  program  uses  the  following: 

1. Calculation of the variance of the target area, rejecting it if below a threshold value.
This avoids making a "match" on the basis of insufficient information: all pieces of
clear sky or blank wall are indistinguishable in the pictures.

2. Threshold rejection of the target area on the basis of its ratio of information
content perpendicular to and parallel to the baseline direction.
This is also justified on an informational argument: candidate areas will be selected
along the "matching line" approximately parallel to the baseline, so that information is
needed along the baseline direction to discriminate between adjacent target areas.

3. A local search in the vicinity of a high correlation match, to find a correlation
maximum.

4. Comparison of the correlation maximum with the "auto-correlation" of the target
window: the average correlation of the target window with itself when shifted by
one pixel in each direction. This attempts to measure how difficult it is to match the
target window: if it has a lot of "high frequency" information, the correlation peak will
be much sharper and its maximum ascertained much less precisely (much more
conservatively).
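
Tests 1 and 4 can be illustrated with a short sketch. It makes several assumptions not stated in the report: the picture is a list of rows indexed as pic[J][I], "each direction" means the four one-pixel shifts (left, right, up, down), and correlation takes the product-moment form. All function names are hypothetical.

```python
import math


def window(pic, i, j, half):
    """Flat list of the (2*half+1)**2 pixels centered on (I, J) = (i, j)."""
    return [pic[y][x]
            for y in range(j - half, j + half + 1)
            for x in range(i - half, i + half + 1)]


def variance(values):
    """Population variance of the intensities in a window."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def correlation(a, b):
    """Normalized correlation of two equal-sized windows (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def auto_correlation(pic, i, j, half):
    """Average correlation of a window with itself shifted one pixel each way."""
    target = window(pic, i, j, half)
    shifts = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed set of directions
    total = sum(correlation(target, window(pic, i + di, j + dj, half))
                for di, dj in shifts)
    return total / len(shifts)
```

A smooth intensity ramp gives an auto-correlation near 1, so a candidate must correlate almost perfectly to be believed; a window full of fine texture gives a low auto-correlation, and correspondingly less is demanded of its matches.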

The variance thresholding tests are lifted from Hannah’s program. I have not subjected these to
any tests of validity, although they seem to do the right thing.

The  local  search  for  a correlation  maximum  is  intuitively  justifiable;  even  if  a particular  match 
passes  all  significance  tests,  it  is  less  likely  to  be  correct  than  a neighbor  with  a higher  correlation. 

The auto-correlation test is the result of my experience with correlation matching. If the highest
correlation is plotted against the auto-correlation for each matchable target area, an [...]
pattern emerges: the average correlation is just the average of 1.0 and the auto-correlation (see
Section X for a histogram of this relation). This may be understood as the result of an average error of
half a pixel for each match. Also, the probability of finding a correlation maximum less than the
auto-correlation is seen to be quite small. This indicates an empirical threshold value for accepting a
match:

THRESHOLD = K + (1-K)*AUTOCORRELATION.

The value of K can be varied to make the threshold more or less strict: K=0 screens out only extremely
unlikely correlations, while K=.5 will disallow half of the good ones. The stricter value (K=.5) is
appropriate when doing a global search for a match, the object being to avoid making a mis-match while
still having a good probability (.5) of finding a match. The more lenient value (K=0) is used to evaluate
the results of a local search for a match to a target immediately adjacent to a previously obtained good
match; this agrees with the region-grower threshold quoted in Section X.


A few "bad" matches were obtained with this threshold function; all had very low
auto-correlations, and their correlation was less than .5. The obvious fix was applied:

THRESHOLD = MAX( K1+(1-K1)*AUTOCORR , K2).

This is the correlation significance test used in the author’s program.
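
The significance test is small enough to state directly in code. The sketch below simply transcribes the formula; the function and parameter names are hypothetical, and the sample values of K1 and K2 in the usage note are taken from Section X rather than from any listing of the program.

```python
def correlation_threshold(autocorr, k1, k2):
    """Lowest correlation accepted for a target window with this auto-correlation.

    k1 slides the threshold from the auto-correlation (k1 = 0) toward 1.0;
    k2 is an absolute floor protecting windows with very low auto-correlation.
    """
    return max(k1 + (1.0 - k1) * autocorr, k2)


def accept_match(corr, autocorr, k1, k2):
    """True when a correlation maximum passes the significance test."""
    return corr >= correlation_threshold(autocorr, k1, k2)
```

With K1 = 0 and K2 = .51 this reduces to MAX(.51, AUTOCORR), the region-grower threshold quoted in Section X. Whether the program compares with "greater than" or "greater than or equal" is not stated; the sketch assumes the latter.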
V. Perspective Distortion.

The techniques described above for stereo matching using fixed-size windows work very well
when objects [...]. This ideal condition will only be observed when [...] is far from both cameras
(relative to the baseline distance), [...] is parallel to the baseline. A glance at the "barn" and "yard"
pairs (section IX) will show that [...] should expect significant deviations from ideality. The face of
the barn appears nearly half again as large [...] in the left; the log in center left of the "yard" pair
[...]. Perspective distortion is [...] as a linear function. Two types of [...]: "uniform scaling" and
"directional scaling".

Uniform scaling attempts [...] piece of scene from [...] will appear twice as big [...]. The proper
correction for this effect is obvious: one merely makes the target [...] dimension as the candidate
window. The relative distance, and [...] computed from the location of the proposed target window
using the camera [...]. However, the operation of scaling the target window to an arbitrary size is more
difficult than it sounds. If the candidate window is 11*11 pixels, the target window should be
5.5*5.5 pixels in [...] sample points are still required from each window to do the [...] sample
points lie only on integral coordinates; some form of interpolation is required. To avoid slowing down
the correlation calculation inordinately, the closest pixel is used as the "interpolated" value.
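
The closest-pixel scheme can be sketched as follows. This is an illustration under stated assumptions, not the author's routine: the picture is a list of rows indexed as pic[J][I], the window stays inside the picture, and exact ties are rounded upward. The name `sample_scaled_window` is hypothetical.

```python
import math


def sample_scaled_window(pic, ci, cj, scale, n=11):
    """Take n*n samples from a window scaled by `scale` about center (ci, cj).

    The samples are spread evenly over an area of about (n*scale) pixels on a
    side; each one is read from the closest pixel rather than interpolated.
    """
    half = (n - 1) / 2.0
    samples = []
    for row in range(n):
        for col in range(n):
            x = ci + (col - half) * scale
            y = cj + (row - half) * scale
            # Closest pixel; floor(v + 0.5) rounds halves upward consistently.
            samples.append(pic[int(math.floor(y + 0.5))][int(math.floor(x + 0.5))])
    return samples
```

With scale = 0.5 and n = 11 the 121 samples are drawn from an area about 5.5 pixels on a side, so many of them repeat the same pixel; with scale = 1 the function returns the ordinary 11*11 window.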

[...] pictures available at this time, the uniform scaling correction for perspective distortion has
not been adequately [...] and "yard" pictures is 1.06 (for the fence-post in the barn picture).

Directional scaling corrects for the distortion due to different orientations of the face of an
object relative to [...] cameras. Consider the plane formed by the lines from each camera center
through the [...] of its "matching area" (these lines are normally skewed, but nearly intersect at
[...] location of [...] the lines so that they intersect midway between [...]). [...] small portion of
the face of an object may be approximated by a square whose center is at the line intersection, and
whose orientation is described by two angles. One angle [...] dihedral between the plane and the square.
The ground is normally at a small dihedral [...] such as [...] and barns have dihedral angles of around
90 degrees. The second angle [...] between the projection of the surface normal of the square onto the
plane and the line [...] intersection. In general, this "normal" angle is [...] camera. For example, the
door of the barn (in the "barn" [...]) [...] viewing angle [...] camera A, but only 80 degrees to
camera B.

The directional scaling implemented in the author’s program accounts for the relative size
changes due [...] barn door). The apparent size ratio differs most from unity when the viewing angle
[...] change in normal angle give a large ratio of cosines. The [...] normal angle is inversely
proportional to the distance from camera to object, by simple geometry. [...] implemented by varying the
aspect ratio of the target window: on the barn door, an 11*11 [...] window is best matched by a 11*17
target window. As in uniform [...] coordinates for sample points arises, and the same solution was
[...] redundant comparisons.
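
The "ratio of cosines" can be made concrete with a small sketch. It rests on an assumption that is only implied by the text: the apparent extent of a small planar patch, measured along the baseline direction, is proportional to the cosine of its "normal" angle as seen from each camera. The function names and the choice of which picture's angle goes in the numerator are hypothetical.

```python
import math


def apparent_size_ratio(normal_angle_a_deg, normal_angle_b_deg):
    """Ratio of a patch's apparent extent in picture A to that in picture B.

    Each angle is the "normal" angle (in degrees) between the patch's surface
    normal and the line of sight from that camera; both must be below 90.
    """
    cos_a = math.cos(math.radians(normal_angle_a_deg))
    cos_b = math.cos(math.radians(normal_angle_b_deg))
    return cos_a / cos_b


def target_window_width(candidate_width, normal_angle_a_deg, normal_angle_b_deg):
    """Width, in pixels, of the target window that covers the same patch."""
    ratio = apparent_size_ratio(normal_angle_a_deg, normal_angle_b_deg)
    return max(1, int(round(candidate_width * ratio)))
```

Near 90 degrees the cosine changes rapidly, so a difference of only a few degrees between the two cameras produces a ratio far from unity, whereas a patch viewed nearly face-on by both cameras needs almost no correction.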

The magnitude of the correction for directional scaling cannot be predicted without knowing the
angle of inclination of the [...] under consideration, so that a "search" must be done for each
prospective point-pair [...] the best correction factor. The time spent in this search is reduced by
the application of two [...]. First, by "continuity", the directional distortion for a point-pair [...]
neighbors. Second, if an upper limit is placed on the magnitude of normal angles (typically 85 degrees),
then the number of different aspect ratios to be tried for the target window is very small (3) for
point-pairs corresponding to far-away objects (100 baselines). This form of directional scaling
correction is moderately successful in increasing correlation scores for point-pairs corresponding to
objects with large normal angles: without it, less than half of the door of the barn can be matched (when
thresholds are adjusted to nearly eliminate "false matches"), while nearly all is matchable using
directional scaling. Knowledge of normal angles should be useful after the matching process, both in
discarding point-pairs that don’t agree with their neighbors, and for modeling of the 3-D scene (neither
of these schemes has been investigated at the present time).

Correcting for distortion due to dihedral angle could be implemented at a reasonable cost in
running time: a square candidate window should be matched to a parallelogram target. This would most
assuredly help in correlating along the ground plane; in fact, moderate success could probably be
achieved by trying only a few dihedral angles for each prospective point-pair. In the barn pair, an
11*11 window of the grassy field in the middle of the left picture would be best matched in the right
picture by a parallelogram whose top edge is skewed two pixels to the right of the bottom edge.
Correction for dihedral angle distortion is, in the author’s view, the most promising area for future
research in stereo matching.


VI.  3-D  Modeling. 

[...] debugging the point-pair matching routines described in this report, the author soon felt the
need for a convenient means of viewing the resultant depth information. A program already existed
(GEOMED [...]) capable of displaying the appearance of three-dimensional line drawings from all [...]
generation of such drawings seemed a trivial matter. The obvious approach [...] the 3-space points that
correspond to neighboring target areas in picture A. Unfortunately, this produced far too many lines
for GEOMED to handle effectively, [...] following rules: where a pair of lines cross, delete one of
them; where [...] connects two "objects", delete it. Of course, the discrimination of objects is a
[...] by the simple algorithm described below.

First some [...]: Each point-pair produced by the matching process corresponds to [...] (X,Y,Z). An
estimate of the error in the Z coordinate can be computed by generating the (X,Y,Z) [...]:
(error in Z) = |Z - Z'|. An "object" may [...] with approximately the same value [...]; error in Z may
be used to give quantitative [...]. Similarly, a "face" of an object is an approximately planar
collection of points in the object. A reasonable GEOMED drawing can consist of merely the perimeters
of all faces.

Algorithm for object discrimination:

Sort all 3-D points by their (IA,JA) coordinates, and draw lines between all points that are
adjacent in picture A (horizontally, vertically, and diagonally). Assign the upper left-hand matched point
of picture A [...] extend the size of the object by including more and more points (moving only along
[...] adjacent points). As points accumulate in an object, eventually a surface plane [...] change in Z
with respect to changes in IA and JA coordinates can then be calculated. Thereafter, the entrance
requirement for points is that their Z [...] surface plane, within a wide error bound (typically 50
times the estimated error in the Z coordinate). When no more points can be included in [...]
upper-left-most point that hasn’t already been included in an object. Continue in this manner until all
points have been assigned an object, then delete lines connecting points in different objects.
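
A simplified sketch of the object discrimination algorithm follows. It is not the author's program: it assumes the matched points lie on a regular grid in picture A, that a single error estimate applies to every Z, and that the surface plane is a least-squares fit to all points already in the object (the report does not say how the plane is computed). The names `fit_plane` and `discriminate_objects` are hypothetical.

```python
from collections import deque


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares Z = a*IA + b*JA + c, or None while the points are collinear."""
    si = sum(p[0] for p in points)
    sj = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sii = sum(p[0] * p[0] for p in points)
    sij = sum(p[0] * p[1] for p in points)
    sjj = sum(p[1] * p[1] for p in points)
    siz = sum(p[0] * p[2] for p in points)
    sjz = sum(p[1] * p[2] for p in points)
    m = [[sii, sij, si], [sij, sjj, sj], [si, sj, len(points)]]
    rhs = [siz, sjz, sz]
    d = det3(m)
    if abs(d) < 1e-9:
        return None
    coeffs = []
    for col in range(3):  # Cramer's rule, one unknown per column
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def discriminate_objects(depth, z_error, step=1, bound=50.0):
    """Label each matched point with an object number.

    depth: dict mapping (IA, JA) to the Z coordinate of that matched point.
    A point joins an object if it is adjacent (8 neighbors) to a member and
    its Z lies within bound*z_error of the object's fitted surface plane.
    """
    label = {}
    next_label = 0
    # Seeds are taken top row first, left to right ("upper-left-most").
    for seed in sorted(depth, key=lambda p: (p[1], p[0])):
        if seed in label:
            continue
        label[seed] = next_label
        members = [seed]
        queue = deque([seed])
        while queue:
            i, j = queue.popleft()
            for di in (-step, 0, step):
                for dj in (-step, 0, step):
                    q = (i + di, j + dj)
                    if q == (i, j) or q not in depth or q in label:
                        continue
                    plane = fit_plane([(a, b, depth[(a, b)]) for a, b in members])
                    if plane is not None:
                        predicted = plane[0] * q[0] + plane[1] * q[1] + plane[2]
                        if abs(depth[q] - predicted) > bound * z_error:
                            continue
                    label[q] = next_label
                    members.append(q)
                    queue.append(q)
        next_label += 1
    return label
```

Lines of the drawing whose two endpoints carry different labels are the ones to delete. Refitting the plane for every candidate is wasteful; running sums would do better, but the sketch favors brevity.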

Algorithm  for  face  discrimination: 

First, all possible non-overlapping triangular faces must be [...] in the [...]. Then adjacent
triangles may be assigned to "faces", starting with the upper-left-most [...]; the components of dZ/dIA
and dZ/dJA are determined by the first [...] "face". Succeeding triangles must share an edge and hence
two vertices with the [...], and the Z coordinate of the new vertex must agree with the extrapolated Z
coordinate within a narrow error bound (typically twice the estimated error in its Z coordinate).

The line drawing of the "stump" (see section IX) was processed using the above algorithms.
There were 627 [points] and [2]179 lines in the "raw" drawing, before lines interior to faces and those
connecting objects were removed; after processing [...] near-most portion of the axe handle [...] and
some of the ground [...] of the stump. Clearly, this object discrimination algorithm needs much
refinement.

In the future, better object and face discrimination may form part of a "feedback loop" that
[...] point-pairs, predicts the appearance of the scene from different viewpoints, and zooms
in on interesting features (calls for high-resolution matching over limited regions such as the stump in
the "yard" scene).
VII. Program Description.

The author’s stereo vision program is divided into three units: a point-pair matcher, a point-pair
analyzer, and a 3-D modeler.

The point-pair matcher is named ZPGROW. It asks the operator for a Data Disk overlay channel
(for visual [...] created), and for the name of a pair of pictures and their associated camera model. At
[...] number of debugging routines are available. Typing "RP" calls [...] parameters, i.e., boundaries
and grid spacing for the prospective target areas in Picture A. Point-pairs are generated in the
following manner:

1. (Outer loop) All unmatched points in picture A lying within the entered boundary are
given to step 2. Points near the center of picture A are tried first. Points with
unacceptable variance are discarded immediately.

2. (Global match) The "matching line" in picture B is calculated for the [...]
picture A, then every third point along this line is used as the center of a candidate
area. No search is made for directional scaling factors. All correlations above a
threshold are sorted; the three highest scoring point-pairs are passed to step 3.

3. (Hill-climb) A three-dimensional hill-climb is performed, varying IB, JB, and directional
scaling factor to optimize the correlation score. The (IB,JB) coordinates are
constrained to lie within a certain distance from the computed matching line
(typically 1 pixel). The directional scaling factor is bounded as described in section V.
If the optimum correlation is above the threshold, the corresponding point-pair is
given to step 4.

4. (Region queue) The target and candidate areas are marked GOOD, and the point-pair
is saved on a disk file. Four neighboring point-pairs are extrapolated from the
parameters of the matching one, and entered on a FIFO queue. Start step 5.

5. (Region grower) If the region queue is empty, go to step 6. Otherwise, a point-pair is
taken off the queue and examined. If it has not been matched, and its variance is OK,
its correlation is checked against the correlations of 4 neighboring target areas and 2
different directional scaling factors. If it is not a local maximum, the target area that
had a higher correlation is checked in the same manner; this hill-climbing is bounded
by the same constraints as in step 3. If a correlation peak is found that exceeds a
threshold, that point-pair is given to step 4. If no correlation peak is found, the
point-pair is put on the mis-match queue (this is only done once for any target area).
If a sub-threshold peak is found, it is so marked: it will not be checked again in this
run of the region grower. Continue step 5.

6. (Mis-match retry) If the mis-match queue is empty, continue the outer loop. Otherwise,
take a point-pair off the mis-match queue and give it to step 3.
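
Step 2 of the procedure above, the global match, can be sketched as follows. This is an illustration only: the matching line is assumed to be supplied as a starting point and a per-pixel step in picture B, and `score` is a hypothetical callable that returns the correlation of the target area with the candidate area centered on a given point.

```python
import math


def global_match(line_start, line_step, n_points, score, threshold, stride=3, keep=3):
    """Coarse search along the matching line in picture B.

    line_start: (IB, JB) of the first point on the matching line.
    line_step: (dI, dJ) advance per point along the line.
    score(point): hypothetical correlation of the target with the candidate at point.
    Returns up to `keep` (correlation, point) tuples, best first.
    """
    scored = []
    for k in range(0, n_points, stride):  # every third point by default
        ib = int(math.floor(line_start[0] + k * line_step[0] + 0.5))
        jb = int(math.floor(line_start[1] + k * line_step[1] + 0.5))
        corr = score((ib, jb))
        if corr > threshold:
            scored.append((corr, (ib, jb)))
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:keep]
```

Each surviving candidate would then seed the hill-climb of step 3. Sampling only every third point trades a small risk of stepping over a sharp correlation peak for a threefold reduction in the number of correlations computed.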

The lack of finesse in the "outer loop" should be obvious: this brute force technique, while slow,
results in nearly as [...] matching as possible. The inner loops gain much speed from an [...] one of
8 states: virgin, matched, bad variance, [...] loop) is due to Hannah. The author merely "patched in"
perspective distortion corrections, a new correlation threshold, and the outer loop.

Correlations are calculated in a "direct" way, that is, computing the squared sum of the errors (a
few computational tricks greatly speed this process). As is commonly known, the FFT may be used to
compute [...] N*log(N) time, as opposed to the N*N time for the direct method: however, in
[...] the tradeoff point between increased overhead and poorer asymptotic behavior has not been
reached on our computer (a PDP-10) [Hannah]. Although great care was taken [...] correlation
calculation, it still accounts for most of the running time of the program.
Levine [Levine] reports the use of relatively inexpensive hardware that speeds this calculation by a
factor of ten or more: still further increases in speed would be possible with specially-designed
hardware assistance.

The point-pair analyzer, MANNA, takes the disk file of point-pairs created by the automatic
point-pair generator, sorts them by IA or JA coordinates, and creates: 1) a listing file of the
VIII. Conclusions.


This project has demonstrated the feasibility of perspective distortion correction in computer
stereo vision. Much more work needs to be done in this area, including:

1.  Correction  for  dihedral  distortion,  as  defined  in  section  V,  to  facilitate  matching  along 
the  ground  plane. 

2. Implementation of a termination test to avoid the fruitless global searching obtained in
the "stump" pair matching (see section IX).

3. Refinements on the "face" and "object" discrimination algorithms of section VI.

4. Acquisition of special purpose hardware to speed up correlation calculations. The
PDP-10 used for this research takes milliseconds to compute a correlation;
order-of-magnitude improvements are quite possible with present-day technology.

Even  given  all  these  improvements,  "real  time"  computer  depth  perception  lies  far  in  the  future, 
judging  by  the  size  of  the  gap  between  the  best  current  efforts  and  the  performance  of  humans. 
IX. Illustrations.


The illustrations below are snapshots of the monitor ("television") output produced by ZPGROW.
[...] stereo pictures [...] as input data: the "barn" and "yard" pair. These were digitized
[...] high resolution, then spatially averaged to result in the reduced "yard" and "barn"; the
"stump" pair is just an enlarged version of the stump in the middle of the yard pair.


Result of a complete matching on the "barn" pair (on a 5*5 grid); white spots are at the [...]
matching areas; 336 point-pairs were generated in 3 minutes of CPU time. Note that the
[...] to the right of the barn is correctly matched; there is a large region visible in the
[...] picture that is occluded by the barn in the left-hand one. The barbed-wire fence in extreme
[...] presents an obstacle to the matching of areas of the grassy field in middle-ground. The
spacing of the dots is much wider on the barn door in the right picture than the left,
[...] moderate amount of perspective distortion.


Complete matching on the "yard" pair, again on a 5*5 grid spacing; [...] point-pairs were
[...] 8.5 minutes. Nearly all the point-pairs were obtained from a single global match. The
[...] spent in finding each point-pair, relative to that observed for the "barn", is due to two
[...]: very little time was spent on the sky of the "barn", as its variance was too low; second,
[...] pair is half again as wide as the "barn" pair, so that the global match (step 2 in section VII)
[...]. Most of this pair is correctly matched, except for some low variance areas of dirt (in
[...]) and trees (in background). The two stumps in the foreground are the most interesting
[...] to large perspective distortion.


[...] Clearly, a more intelligent termination algorithm would be helpful.
[...] the axe handle on top of the stump; as it is, only a few
points of it are located. A large amount of dihedral distortion prevents the accurate matching of
more of the ground.

The results of the "stump" pair matching above were fed to MANNA and MKB3D; GEOMED was
used to depict a line drawing of the stump as viewed from sixteen orientations, reproduced on the
next page. At top left, [...] seen "head on" from Camera A (see the left hand picture of the
"stump" pair). [...] left to right, top to bottom, the camera is raised in an arc
centered at the stump, so that in the bottom right picture the camera is directly above the stump. In
this last picture, a portion of the [...] portion of each picture, a cluster of
matching point-pairs at [...] position, or remains nearly stationary throughout the
[...] a "bow tie" in the views at the top [...] a large trapezoid in the bottom views. A portion of
the ground to the right of the stump [...] vertically from the first to the last picture, eventually
becoming partially occluded by the far end of the axe handle in the last picture.


X.  Correlation  / Auto-correlation  Histogram. 

This histogram was prepared from the results of complete matchings of the "yard", "barn", and
"stump" pairs, counting the total number of good matches found for a given range of auto-correlation
and correlation values. For example, 55 matches of correlation .95 to 1.0 were found for windows of
auto-correlation from .90 to .95.

As matches are deemed "good" and thus included in the histogram only if they pass the author’s
correlation test, the sample is biased. However, 98% of the matches were found in the inner loop of the
"region grower", where the correlation threshold is quite low:

THRESHOLD = MAX(.51,AUTOCORR).

This explains the absence of matches whose correlation is less than the auto-correlation, but the trend
in each row indicates that not many good matches would be found there anyway. The number of
matches found for a given auto-correlation seems to peak where the correlation = (1+AUTOCORR)/2.